Comparison of non-invasive, scalp-recorded auditory steady-state responses in humans, rhesus monkeys, and common marmosets

Auditory steady-state responses (ASSRs) are basic neural responses used to probe the ability of auditory circuits to produce synchronous activity to repetitive external stimulation. Reduced ASSR has been observed in patients with schizophrenia, especially at 40 Hz. Although the ASSR is a potentially translatable biomarker for both animal models and patients with schizophrenia, little is known about the features of ASSR in monkeys. Herein, we recorded the ASSR from humans, rhesus monkeys, and marmosets using the same method to directly compare the characteristics of ASSRs among the species. We used auditory trains over a wide range of frequencies to investigate the frequency suitable for ASSR induction, because monkeys usually vocalize in frequency ranges different from those of humans. We found that monkeys and marmosets also show auditory event-related potentials and phase-locking activity to gamma-frequency trains, although the optimal frequency with the best synchronization differed among these species. These results suggest that the ASSR could be a useful translational, cross-species biomarker to examine the generation of gamma-band synchronization in nonhuman primate models of schizophrenia.


Materials and methods
Subjects and participants. In this experiment, we used five adult common marmosets (Callithrix jacchus, two female and three male, 5-8 years old, 272-342 g) and five young adult rhesus monkeys (Macaca mulatta, five female, 6-12 years old, 5.2-7.2 kg) for EEG recordings. All marmosets and monkeys were born and housed in temperature-controlled colonies at the Primate Research Institute, Kyoto University. They were maintained on a standard 12-h light/dark cycle. The marmosets and monkeys were fed monkey chow and supplements, such as apples and sweet potatoes. Water was available ad libitum. Experiments were conducted in a sound-attenuated box placed in an experimental room. All experiments for marmosets and monkeys were approved by the Animal Experimentation Committee of the Primate Research Institute of Kyoto University (No. 2019-024, 2020-089 for the scalp electroencephalogram (EEG) recording, No. 2019-025, 2020-111 for the electrocorticography (ECoG) recording) and were conducted in accordance with the Guide for Care and Use of Laboratory Primates published by the Primate Research Institute, Kyoto University, and ARRIVE guidelines (https://arriveguidelines.org/). For human experiments, five healthy Japanese young adult volunteers participated (four female and one male, 24-27 years old). All participants were right-handed and had no neurological or psychiatric history. They were recruited from the Primate Research Institute, and written informed consent was obtained from each participant. All experiments were performed in accordance with the Declaration of Helsinki and the guidelines approved by Kyoto University. This study was approved by the Human Research Ethics Committee of the Primate Research Institute (No. 2019-09).
Stimulus. The auditory stimuli were generated by a custom script written in MATLAB (MathWorks Inc., Natick, USA) and saved as audio files (48 kHz). The stimuli were played on a computer using Audacity (https://www.audacityteam.org/). We used a sound interface (AV amplifier DSP-AX459, Yamaha, Japan) for the human experiments and a sound card (Sound Blaster Audigy FX, Creative Technology, Japan) for the animal experiments. For the standard condition, each train contained 1-ms clicks at one of three frequencies (30, 40, or 80 Hz). The duration of each train was 500 ms, and the inter-train interval (ITI; 501-1100 ms from stimulus onset) was 600 ms. The three possible frequencies were presented in a pseudorandom manner. Each block contained 250 trains for each frequency, with 750 trains in total. For the wideband condition, each train contained 1-ms clicks at one of 11 frequencies (20 to 120 Hz, 10 Hz step). The duration and ITI were the same as those in the standard condition. The trains of the 11 possible frequencies were presented in a pseudorandom manner. Each block contained 100 trains for each frequency, with 1100 trains in total. Each subject performed two blocks of the standard condition and three blocks of the wideband condition. Each recording block took approximately 14 min for the standard condition and 18 min for the wideband condition. The ITIs were jittered by 8.3 ms (half a cycle of 60 Hz), so that if the recording was contaminated by 60 Hz power line noise, the noise was canceled out in the average because its phase shifted from trial to trial.
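The stimulus construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB script: it assumes that 48 kHz is the sampling rate, that the clicks are rectangular pulses of unit amplitude (the click shape is not stated), and the function names are hypothetical. The last helper illustrates why shifting trial onsets by half a 60 Hz cycle cancels line noise in the average.

```python
import math

FS = 48_000          # assumed sampling rate in Hz
CLICK_MS = 1.0       # click duration in ms
TRAIN_MS = 500.0     # train duration in ms


def click_train(rate_hz, fs=FS, click_ms=CLICK_MS, train_ms=TRAIN_MS):
    """Return samples (0.0 or 1.0) of a train of rectangular clicks at rate_hz."""
    n_samples = round(fs * train_ms / 1000)
    click_len = round(fs * click_ms / 1000)
    n_clicks = round(rate_hz * train_ms / 1000)
    samples = [0.0] * n_samples
    for i in range(n_clicks):
        start = round(i * fs / rate_hz)          # onset of the i-th click
        for j in range(start, min(start + click_len, n_samples)):
            samples[j] = 1.0
    return samples


def click_onsets(samples):
    """Indices at which a click starts (rising edges)."""
    return [i for i, s in enumerate(samples)
            if s == 1.0 and (i == 0 or samples[i - 1] == 0.0)]


def mean_line_noise(t, onset_shifts, line_hz=60.0):
    """Average a unit 60 Hz sinusoid at epoch time t (s) over trials
    whose onsets are shifted by the given amounts (s)."""
    return sum(math.sin(2 * math.pi * line_hz * (t + s))
               for s in onset_shifts) / len(onset_shifts)


train_40 = click_train(40)
print(len(train_40), len(click_onsets(train_40)))
# Two trials whose onsets differ by half a 60 Hz cycle (1/120 s, about 8.3 ms):
print(abs(mean_line_noise(0.0123, [0.0, 1 / 120])) < 1e-9)
```

With these assumptions a 40 Hz train holds 20 clicks in 24,000 samples, and the averaged line noise for the two shifted trials is zero up to floating-point error.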
Apparatus. Prior to EEG recording, marmosets and monkeys underwent a head-shaving procedure under anesthesia and the preparation of head-fitting masks. Each marmoset received an intramuscular injection of alfaxalone (Alfaxaone® 4 or 5 mg/kg, Meiji Seika Pharma) and atropine 0.05 mg/kg. They were supplied with 1:1 oxygen-air mixture (1 L/min) through the mask, and pulse rate and SpO2 levels were monitored using pulse oximeters. Each monkey was anesthetized using intramuscular injection of ketamine (Ketalar® 2.5 mg/kg, Daiichi Sankyo Propharma) and medetomidine (0.1 mg/kg). After the procedures were done, the anesthesia was antagonized with atipamezole (0.5 mg/kg, Orion Pharma).
During the experiment, the marmoset or monkey was seated in a primate chair and its head was restrained with a uni-frame thermoplastic mask (Toyo Medic, Tokyo, Japan) in a sound-attenuated room. The stimuli were delivered by a speaker placed 30 cm from the head and controlled at a sound level of 65-75 dB, measured near the ear. For the human experiments, the participant was seated in a chair and listened to the same stimuli presented in the animal experiments via a speaker placed 90 cm from the head.
The post-processing of the EEG data was performed using EEGLAB 38 and ERPLAB 39. The EEG data were re-referenced to the linked ear reference and high-pass filtered (2 Hz). Then, epochs from 100 ms before to 1000 ms after stimulus onset were created. The data were baseline-corrected to the mean of the 100-ms pre-stimulus period, artifacts were removed using a rejection criterion of ± 150 μV, and the remaining epochs were averaged for each subject. Finally, group-averaged waveforms were obtained by averaging the ASSR of five subjects for each species. Event-related spectral perturbation (ERSP) was calculated to visualize the mean event-related change in spectral power over time across a broad frequency range using formula (1), where F_k(f, t) is the spectral estimate of trial k at frequency f and time t.
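For reference, the ERSP definition commonly used with EEGLAB can be written as below. This is given on the assumption that formula (1) follows that standard form; n, the number of trials, is a symbol introduced here for illustration.

```latex
\mathrm{ERSP}(f, t) = \frac{1}{n} \sum_{k=1}^{n} \left| F_k(f, t) \right|^{2}
```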
Time-frequency decomposition was applied to the activities using sinusoidal wavelet transforms, three cycles in length at the lowest frequency (10 Hz), increasing linearly with frequency up to 32 cycles at the highest frequency (120 Hz).
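To make the wavelet parameters concrete, the sketch below interpolates the number of cycles linearly between the two stated endpoints (3 cycles at 10 Hz, 32 cycles at 120 Hz) and converts it to a window length. It is one plausible reading of the description, not the EEGLAB implementation, and the function names are hypothetical.

```python
F_MIN, F_MAX = 10.0, 120.0     # lowest and highest analysis frequencies (Hz)
C_MIN, C_MAX = 3.0, 32.0       # wavelet cycles at F_MIN and F_MAX


def wavelet_cycles(freq_hz):
    """Number of cycles at freq_hz, interpolated linearly between the endpoints."""
    frac = (freq_hz - F_MIN) / (F_MAX - F_MIN)
    return C_MIN + frac * (C_MAX - C_MIN)


def wavelet_duration_ms(freq_hz):
    """Temporal length of the wavelet (cycles divided by frequency), in ms."""
    return 1000.0 * wavelet_cycles(freq_hz) / freq_hz


for f in (10, 40, 80, 120):
    print(f, round(wavelet_cycles(f), 2), round(wavelet_duration_ms(f), 1))
```

Under this reading the analysis window shortens only slightly with frequency, from 300 ms at 10 Hz to about 267 ms at 120 Hz, so temporal resolution is roughly constant while frequency resolution improves at higher frequencies.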
Inter-trial coherence (ITC) is a measure of phase consistency over trials, corresponding to the phase-locking factor, which was defined previously 40. ITC takes values between 0 and 1. An ITC value close to 0 reflects high variability of phase angles across trials, whereas a value near 1 indicates perfect synchronization between EEG data and the time-locking event over trials 38.
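The behaviour of ITC at its two extremes can be illustrated with a short sketch. It uses the usual phase-locking definition (the length of the mean of unit-length phase vectors across trials) at a single time-frequency point; the function name and the synthetic trial data are assumptions made for illustration.

```python
import cmath
import math
import random


def itc(spectral_estimates):
    """Inter-trial coherence at one time-frequency point.

    spectral_estimates: complex values F_k(f, t), one per trial.
    Each value is normalised to unit length so that only phase contributes.
    """
    unit_vectors = [z / abs(z) for z in spectral_estimates if abs(z) > 0]
    return abs(sum(unit_vectors)) / len(unit_vectors)


# Perfect phase locking: the same phase on every trial, arbitrary amplitudes.
locked = [cmath.rect(1.0 + k, 0.7) for k in range(200)]

# No phase locking: phases drawn uniformly at random.
rng = random.Random(0)
unlocked = [cmath.rect(1.0, rng.uniform(-math.pi, math.pi)) for _ in range(200)]

print(round(itc(locked), 3), round(itc(unlocked), 3))
```

The locked trials give an ITC of 1 regardless of amplitude, whereas the random-phase trials give a value close to 0 that shrinks further as the number of trials grows.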
We calculated the mean ITC values (mITCs) during a 400-ms period of train presentation (101-500 ms from stimulus onset) and a 400-ms period of baseline (601-1000 ms from stimulus onset) in each frequency condition. The mITCs were calculated in the band from 5 Hz below to 5 Hz above the target frequency. Based on the ITC map, brain activity in the initial 100 ms of train presentation was excluded because it may include responses other than the ASSR, such as onset responses, and activity in the initial 100 ms of baseline was also excluded because it included offset responses. To investigate whether the mean ITC values during the train presentation were larger than those during baseline, we conducted a one-tailed paired t-test to compare the mITCs between the two periods at each frequency for each species.
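The windowing and the paired comparison can be sketched as follows. This is an illustration under stated assumptions: the ITC map is a plain nested list indexed by frequency and time, the function names are hypothetical, the five pairs of mITC values are made up, and significance is judged against the tabulated one-tailed critical value for df = 4 rather than by computing an exact P value.

```python
import math
import statistics

T_CRIT_DF4_ONE_TAILED_05 = 2.132   # critical t for df = 4, one-tailed alpha = 0.05


def mean_itc(itc_map, times_ms, freqs_hz, target_hz, t_start, t_end, half_band=5.0):
    """Average ITC over t_start..t_end (ms) and target_hz +/- half_band (Hz).

    itc_map[i][j] is the ITC at freqs_hz[i] and times_ms[j].
    """
    values = [
        itc_map[i][j]
        for i, f in enumerate(freqs_hz) if abs(f - target_hz) <= half_band
        for j, t in enumerate(times_ms) if t_start <= t <= t_end
    ]
    return statistics.fmean(values)


def paired_t(stim, base):
    """Paired t statistic (stim minus base) and its degrees of freedom."""
    diffs = [s - b for s, b in zip(stim, base)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Made-up mITC values for five subjects, for illustration only.
stim = [0.30, 0.25, 0.40, 0.35, 0.28]   # train period, 101-500 ms
base = [0.10, 0.12, 0.15, 0.11, 0.09]   # baseline period, 601-1000 ms
t, df = paired_t(stim, base)
significant = df == 4 and t > T_CRIT_DF4_ONE_TAILED_05
print(round(t, 2), df, significant)
```

For one subject and one stimulus frequency, `mean_itc(itc_map, times_ms, freqs_hz, 40, 101, 500)` and `mean_itc(itc_map, times_ms, freqs_hz, 40, 601, 1000)` would give the two values that enter the paired comparison.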
In addition, to test the differences of the mITC values among three species, we performed a two-way ANOVA with stimulus frequencies as a within-subject factor and species as a between-subject factor.
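As a small worked check of this design, the degrees of freedom of a two-way mixed ANOVA follow directly from the group sizes. The sketch assumes equal group sizes, the 11 frequencies of the wideband condition, and no sphericity correction; the function name is hypothetical.

```python
def mixed_anova_df(n_per_group, n_groups, n_levels):
    """Degrees of freedom for a two-way mixed ANOVA with equal group sizes.

    n_groups: levels of the between-subject factor (species)
    n_levels: levels of the within-subject factor (stimulus frequency)
    """
    n_subjects = n_per_group * n_groups
    df_between = n_groups - 1
    df_between_error = n_subjects - n_groups
    df_within = n_levels - 1
    df_interaction = df_between * df_within
    df_within_error = df_between_error * df_within
    return {
        "species": (df_between, df_between_error),
        "frequency": (df_within, df_within_error),
        "interaction": (df_interaction, df_within_error),
    }


# Five subjects per species, three species, 11 stimulus frequencies.
print(mixed_anova_df(5, 3, 11))
```

For five subjects in each of three species and 11 frequencies this gives (2, 12) for species, (10, 120) for stimulus frequency, and (20, 120) for the interaction.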
ECoG data acquisition and analysis in common marmosets. To determine whether filtering of specific components of the brain signal by the skull or subcutaneous tissue produces the frequency characteristics of the ASSR recorded from the scalp, we used two adult female common marmosets (8 and 7 years old, body weights 380 and 400 g) for the ECoG recordings. These marmosets were not used for EEG recording. The animals were anesthetized with ketamine (3 mg/kg), xylazine (1 mg/kg), and atropine (0.05 mg/kg), and the anesthesia was maintained by inhalation of isoflurane (~ 2%). We implanted a 16-channel probe (E16-500-5-200, NeuroNexus, USA) into the epidural space, and the connector was attached to the skull using steel screws and dental cement. The size of the electrode array was 1.8 mm × 2.0 mm. The electrode array of one marmoset was located in the left primary auditory cortex, and that of the other marmoset was implanted in the left prefrontal cortex. The ground and reference electrodes were placed in the epidural space. During the experiment, the awake marmoset was seated in a primate chair and its head was not restrained. ECoG data were recorded at a sampling rate of 1 kHz using a neural recording data acquisition system (OmniPlex, Plexon, Dallas, USA). The experimental apparatus and stimuli were the same as those used for scalp recording. We recorded the neural data during two ASSR sessions on one day for both marmosets. The procedures and parameters of analysis of the ECoG data were the same as those of the scalp EEG data described above.
[Figure caption] According to the International 10-20 system, the electrodes were placed with Fz, Cz, Pz, Oz, A1, and A2 in the same locations across all subjects, regardless of the species. The ground electrode was set at C3 or C4.
www.nature.com/scientificreports/

Results
Scalp EEG recording. We recorded scalp EEGs from humans, rhesus monkeys, and common marmosets. Figure 2 shows the grand-averaged ERPs to 30 (Fig. 2a), 40 (Fig. 2b), and 80 Hz (Fig. 2c) click trains for all three species. After the onset of train presentation, the ERP showed periodic waves of the same frequency as the click train in all species. For humans, the periodicity was more prominent for the 40-Hz click trains than for the 30- and 80-Hz trains. As shown in Fig. 2b, the periodic waves continued during the train presentation and then disappeared after the train offset. On the other hand, the periodicity was also clear for the 30- and 80-Hz trains in monkeys and marmosets (Fig. 2a and c). For marmosets, the ERP to the 30 Hz train at the Fz electrode shows approximately 13 waves during the initial 500 ms of train presentation. These waves constituted a 30 Hz ASSR (Fig. 2a). In addition, each wave has a notch, which indicates that the brain also responds at 60 Hz, a harmonic of the train frequency. Similarly, harmonic activity to the 40 Hz train was observed at the marmoset's Fz electrode (Fig. 2b).
Remarkably, the ERP of marmosets at the Fz electrode shows high synchronization to the 80 Hz train (Fig. 2c). The response to the onset of the train seemed to differ across the three species. The onset responses were quick and prominent in rhesus monkeys, especially at Fz and Cz, whereas they were modest in humans for all frequencies. For marmosets, the first response was weak, with a very short latency. The peak latency of the first response was the shortest in marmosets (11.1 ± 0.7 ms), intermediate in rhesus monkeys (24.8 ± 1.8 ms), and longest in humans (81.6 ± 11.6 ms). There was a significant main effect of species on the peak latency of the first response (two-way ANOVA, F(2,10) = 3.49, p < 0.001), whereas there was no effect of stimulus frequency (F(2,10) = 2.35, p = 0.49).
The grand-averaged time-frequency plots of ERSP and ITC for 40 Hz stimuli from the Cz electrode in all three species are shown in Fig. 3. The evoked responses were normalized based on a pre-stimulus baseline (−200 to 0 ms from stimulus onset). The ERSP plots show a remarkable increase in spectral power at 40 Hz during the 40 Hz train presentation for humans. Similarly, the ITC plot reveals a prominent horizontal band at 40 Hz, indicating high synchronization to the train stimulus. These bands promptly disappeared when the presentation of stimuli was stopped 500 ms after stimulus onset. Increased spectral power and high synchronization at 40 Hz during stimulus presentation were also observed in rhesus monkeys and marmosets. Unlike in humans, synchronization at 80 and 120 Hz, in addition to 40 Hz, was clearly observed in the ITC maps of rhesus monkeys and marmosets. These results indicate that brain responses to harmonics of the train were evident in rhesus monkeys and marmosets.
To evaluate the phase-locking activity synchronized with the repetitive stimuli, the mITCs during train presentation were calculated and compared with those during the baseline period under the wideband condition. Figure 4 shows the mITCs calculated from the Cz electrode and time-frequency analysis of ITC for each frequency of stimulus in humans (Fig. 4a), rhesus monkeys (Fig. 4b), and common marmosets (Fig. 4c). For humans, the mITCs during train presentation were significantly larger than those during the baseline period for almost all frequencies (P < 0.05), except for the 20 Hz (t(4) = 1.36, P = 0.12) and 60 Hz (t(4) = 1.99, P = 0.06) trains. In addition, another peak was observed at 90 Hz (t(4) = 4.28, P = 0.006), showing a bimodal form with high synchronization. The ITC map for each stimulus frequency shows phase-locking activity at the stimulus frequency, as shown by the results for 40 Hz trains in Fig. 3. On the other hand, for rhesus monkeys, the mITCs during the train presentation were significantly larger than those during the baseline period for all frequencies, with no prominent preference for a specific frequency (P < 0.01). Although the mITCs were greatest at 50 Hz (t(4) = 4.09, P = 0.007) and 80 Hz (t(4) = 4.54, P = 0.005), there were only slight differences in the mITCs. For marmosets, the mITCs during the train presentation were significantly larger than those during the baseline period for almost all frequencies (P < 0.05), except for the 60 Hz (t(4) = 1.21, P = 0.15) and 70 Hz (t(4) = 1.95, P = 0.06) stimulation. The bimodal pattern of the mITCs with one peak at 40 Hz (t(4) = 3.26, P = 0.02) and another at 100 Hz (t(4) = 8.67, P = 0.0005) was similar to that in humans, but the marmosets showed higher synchronization at higher gamma frequencies (more than 80 Hz).
The differences in the mITC were examined by a two-way ANOVA with stimulus frequency as a within-subject factor and species as a between-subject factor. No significant main effect of either stimulus frequency or species was observed in the mITCs (F(10, 120) = 3.79, P = 0.15 and F(2, 12) = 1.71, P = 0.22, respectively). The interaction between stimulus frequency and species was statistically significant (F(20, 120) = 6.741, P = 0.04). Since the interaction was significant, we concluded that there may be a species difference in the frequency response characteristics. However, a post-hoc analysis showed no significant difference. This may be due to the small sample size. Therefore, we performed a nonparametric test with correction for multiple comparisons to examine the mITC differences by stimulus frequency in each species. We found a significant difference among stimulus frequencies in humans [...] the 20-120 Hz stimulation were observed in common marmosets and rhesus monkeys, as in humans, but they showed different patterns of ASSR from low to high gamma frequencies. In addition to the phase-locking activity, a low-frequency transient response of approximately 10-40 Hz immediately after stimulus onset was observed in humans and rhesus monkeys (ITC maps in Fig. 4). In marmosets, a similar component was observed at approximately 40-80 Hz. These components in the ITC maps correspond to the neural responses to the onset of the train observed in the averaged ERP (Fig. 2). Furthermore, low-frequency responses around the train offset were observed in ITC maps with stimulus frequencies above 60 Hz, especially for humans and rhesus monkeys. This seems to be a neural response to the stimulus offset (ITC maps in Fig. 4).
ECoG recording. We also recorded the ASSR using ECoG from the auditory cortex and the prefrontal cortex of two other common marmosets using the same stimuli and compared it with the results of scalp recordings. Figure 5 shows the mITCs (upper panel), which were calculated from the ECoG data from the primary auditory cortex. In both the auditory cortex and prefrontal cortex (not shown), the mITCs for all frequencies showed similar bimodal patterns. The frequencies of the peaks were slightly different, but there was one peak at low frequencies (40-50 Hz) and another peak at high frequencies (above 80 Hz). The time-frequency maps of ERSP and ITC for 50 and 100 Hz stimuli are shown in Fig. 5. Higher phase-locking activity was observed for high-gamma band stimuli. These features are essentially the same as those of the scalp data.

Discussion
We recorded the ASSR synchronized to gamma-band stimuli from humans, rhesus monkeys, and common marmosets using the same non-invasive method. The ERPs at each stimulus frequency showed periodic activity synchronized with the stimulus frequency. The spectrogram also showed a peak of power consistent with each stimulus frequency. In addition, the ITC, which reflects the temporal and spectral synchronization and captures the consistency in phase alignment of neuronal activity that is elicited by auditory stimuli, was larger during the stimulus presentation period than during the baseline period without stimulus in all three species. These results indicate that ASSRs were present in rhesus monkeys and common marmosets as well as in humans using the same scalp recording technique. Most studies of ERPs in monkeys have used invasive techniques, including brain surgery, to implant electrodes inside the skull 41-45. In recent years, several studies have reported the use of non-invasive methods for recording electroencephalography (EEG) in monkeys 46-50. We have previously demonstrated the non-invasive scalp recording of cortical auditory evoked potentials (CAEPs) in alert rhesus monkeys 28,29 and common marmosets 30. Such non-invasive scalp recordings of non-human primates are promising because they can be directly compared with the non-invasive EEG measurements routinely performed in humans. In addition, non-invasive EEG measurements allow EEG data to be obtained without surgically damaging valuable and limited numbers of primate models of psychiatric diseases. In the current study, we applied this method to the ASSR, and, to the best of our knowledge, this is the first report of ASSR in non-human primates using a non-invasive method. The ASSRs for the gamma frequencies are thought to be generated from the auditory cortex and subcortical regions 21,51-53.
A neurophysiological study has reported that the neurons in the auditory cortex of anesthetized monkeys synchronize their firing to the individual clicks in trains up to 100 Hz 42. However, responses at higher frequencies are thought to originate from the brainstem areas 53. The inferior colliculus may play an especially important role in the generation of 80-Hz responses. In the current study, we recorded higher synchronized activity to trains with high frequencies, particularly in marmosets. The marmoset's head is small, so that the neuronal activity in the inferior colliculus may be easily captured by electrodes on the scalp. Nevertheless, we demonstrated that human participants also showed significantly high phase-locking activity to high-gamma trains. The results suggest that the ASSR to high-gamma stimuli also plays an important role.
Comparison among three species. We found a difference in neuronal entrainment to auditory trains at frequencies of 20-120 Hz among humans, rhesus monkeys, and common marmosets. Humans and marmosets showed one peak of high synchronization to a 40 Hz train and another peak to high gamma trains (90 and 100 Hz), which is a similar bimodal pattern. For the response to high gamma trains (> 80 Hz), the phase-locking activity was large in marmosets. On the other hand, rhesus monkeys showed high synchronization to all frequencies, but no obvious preference for specific stimuli. There are several possible reasons for these differences among primate species. These three species differ in their characteristics of basic auditory function: in their audible frequency ranges, the most sensitive parts of the audiogram 35,37,54-57, and the frequency content of their vocalizations 58-61. These differences may partly explain our current findings indicating that the ITC of the response to stimuli at 40 Hz was the largest in healthy humans, whereas the maximal coherence was at 50 Hz in rhesus monkeys and at 100 Hz in marmosets. Alternatively, the frequency shift may also be dependent on brain volume 62.
Harmonics. Harmonic activities were observed in the time-frequency map during train presentation at lower frequencies (Figs. 3 and 4). This is because we used the click train as a stimulus that has harmonic spectral cues and activates several tonotopic areas corresponding to each frequency, whereas the modulation frequency of amplitude-modulated tones has only one peak at the stimulus frequency 64. These harmonic activities were more prominent in marmosets than in rhesus monkeys and humans. A previous study has reported that macaque monkeys showed no preference for harmonic sounds 65. In contrast, neurons that are sensitive to the combination of temporal envelope and spectral cues were recorded in the marmoset auditory cortex 66. Furthermore, the resolvability of harmonics in marmosets is different from that in humans 67. These reports are consistent with our results showing strong harmonic activity in marmosets. Harmonic activity in marmosets may be a clue to the evolution of human speech.
Comparison with ECoG data. We found that the ASSRs recorded from the scalp EEG showed the highest synchrony with the auditory stimuli at high gamma frequency, and the result was the same as the ECoG data recorded from the primary auditory cortex (Fig. 5) and the prefrontal cortex (not shown). Scalp EEG can detect signals when the cortex is active in a synchronized manner over 10 cm² in humans 68,69, whereas the spatial resolution of the ECoG is higher (approximately < 5 mm² in humans) 68. Activity at high frequencies is difficult to record on the skull because the signals are obscured by electrical artifacts and scalp EMG 70,71, and the skull acts as a low-pass filter. Based on these characteristics of EEG recordings, we showed the results recorded by ECoG to demonstrate that the features of the ASSRs recorded from EEG are not affected by distance from the signal source or attenuation of certain frequencies. The results suggest that the characteristics of ASSR obtained from scalp recording are robust.
Transient responses. The measurement of ASSRs allows us to evaluate not only the synchronous activation of neuronal ensembles but also the transient response to auditory stimuli 19. We found a CAEP as a response to the onset of each train. The response to the first stimulus was more apparent than the responses to the other stimuli, but it was partly obscured by the periodic waveform of the ASSR. The first transient response has a positive peak and a robust negative peak; therefore, the response is thought to be homologous to P1 and N1 in humans. The responses were prominent at Fz and Cz, reflecting that the CAEP originates in the first stages of temporal integration of thalamocortical inputs to the cerebrum, followed by a higher level of auditory processing in the auditory cortex. The peak latencies of the first positive responses were longest in humans (81.6 ± 11.6 ms), intermediate in rhesus monkeys (24.8 ± 1.8 ms), and shortest in marmosets (difficult to define). These data are consistent with previous reports 72-74.
The generation of gamma-band synchronization has been suggested to be dependent on the network of GABAergic interneurons that act as pacemakers and produce rhythmic inhibitory postsynaptic potentials in pyramidal neurons 9,14,79. In addition to GABAergic dysfunction, loss of N-methyl-D-aspartic acid (NMDA) receptor function in inhibitory parvalbumin interneurons leads to abnormal hyperactivity of excitatory neurons, which results in impaired synchronization to external stimuli. Thus, ASSR reduction in schizophrenia reflects the dysfunction of GABAergic and/or NMDA receptors. In the current experiment, we demonstrate that ASSRs are available in non-human primates to evaluate gamma-band synchronization to external stimuli by the scalp recording method. These methods enable us to use ASSR as a translatable biomarker between non-human primate models of schizophrenia and patients. Many EEG studies in patients with schizophrenia have focused on ASSR to lower-gamma frequencies 11,15,17,18,75. However, as described above, distinct neural networks have been suggested to be involved in low- and high-gamma responses 53. A magnetoencephalography (MEG) study has reported that the power of ASSRs at 40 Hz and 80 Hz was decreased in patients with schizophrenia 19. Furthermore, another MEG study reported a reduction in power and phase-locking activity of the 80 Hz ASSR in patients with schizophrenia, and the lower 80 Hz ASSR was related to hallucination but not to the reduction of the power of ASSR at 20 or 30 Hz 21. These results suggest that abnormal high-gamma synchronization is related to positive symptoms of schizophrenia, and the dysfunction of ASSR at high-gamma frequencies may be more involved in the pathogenesis of schizophrenia than previously thought. In the future, it may be possible to examine abnormalities in different brain regions and neural networks in patients with schizophrenia by measuring and analyzing the ASSR at low and high gamma frequencies.
In summary, at least for ASSR recording in marmosets, non-invasive recording from the scalp is considered sufficient because the auditory response characteristics are not altered when compared to ECoG, which requires invasive intracranial implantation of electrodes. This method allows us to directly compare brain activity as a potential biomarker, including the ASSR, P300, N50, and mismatch negativity, for neuropsychiatric diseases between humans and non-human primate models. Further experiments are required to confirm this hypothesis.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.